
    Expected stochastic occurrences of unread genomic sequence stretches.

    <p>The number of expected unread sequences of a minimum length is plotted for a genome of size 100 Mbp and 40 million 32-nt sequence reads. The probability, <i>P</i>, of obtaining an unread sequence of at least length <i>L</i> is equal to the probability of not obtaining any 32-bp sequence fragments that cover a stretch of length <i>L</i>. This is given by [equation missing from source], where <i>G</i> is the genome size, <i>S</i> is the sequence read length (32), and <i>n</i> is the number of sequence reads examined. The expected number of deletions is then given approximately by [equation missing from source].</p>
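
The caption's equations did not survive extraction, so the following is only a sketch of one standard way to estimate such a probability, not the authors' formula. It assumes read start positions are independent and uniform over roughly G positions, so that a read overlaps a given stretch of length L if it starts at any of L + S - 1 positions. The function names and the "G candidate positions" estimate of the expected count are assumptions made for illustration.

```python
import math

def prob_stretch_unread(G, S, n, L):
    """Probability that none of n reads of length S overlaps a fixed
    stretch of length L. ASSUMPTION: read starts are independent and
    uniform over G positions; a read overlaps the stretch when it
    starts at one of L + S - 1 positions."""
    p_hit = min(1.0, (L + S - 1) / G)
    return (1.0 - p_hit) ** n

def expected_unread_stretches(G, S, n, L):
    """Crude estimate of the expected count. ASSUMPTION: about G
    candidate positions, each unread with the probability above
    (overlapping stretches are counted more than once)."""
    return G * prob_stretch_unread(G, S, n, L)

if __name__ == "__main__":
    # Parameters quoted in the caption: 100 Mbp genome, 40 million 32-nt reads.
    G, S, n = 100_000_000, 32, 40_000_000
    for L in (1, 10, 50, 100):
        print(L, expected_unread_stretches(G, S, n, L))
```

Under these assumptions the estimate falls off roughly exponentially with L, since (1 - x)^n is close to exp(-n x) when x is small.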

    Determining parameters for maximally efficient screens.

    <p>Graphs of <i>n<sub>act</sub> vs.</i> log<sub>10</sub><i>y</i> are plotted for three different values of <i>W</i>. For <i>n<sub>req</sub></i> = 5,000 (horizontal line), the minimal value of <i>W</i> is 37,642. log<sub>10</sub><i>y<sub>max</sub></i> is indicated. Plots generated by the program Mathematica 5.0 (Wolfram Research).</p>

    File structure of the ‘galign’ folder.

    <p>Yellow boxes, folders. Blue circles, executable files. Green hexagons, accessory text files. SNP_results contains the output of SNP_search. Alignment_results contains the output of Alignment_tool. Sequence_reads contains the output of Format_convert. Deletion_results contains the output of Deletion_search. The output of Genome_assemble is located in the Genome_sequences folder. A pre-assembled <i>C. elegans</i> genome (version 195) is distributed with the current software package. Feature_locations contains information about exons, introns and intergenic regions (see text) as well as the genetic code table for amino-acid predictions.</p>

    The effects of different parameters on the number of haploid genomes screened.

    <p>(A–C) Plots assuming all <i>F</i>1 animals are placed together on a single plate. <i>p</i> = 0.75, unless otherwise indicated. All plots were generated using the program Mathematica 5.0 (Wolfram Research). (A) Black line, graph of <i>n<sub>act</sub></i>/<i>N vs.</i> log<sub>10</sub><i>y</i> for <i>N</i> = 1,000. Blue and red lines, graphs of asymptotes and their equations. (B) Graphs of <i>n<sub>act</sub></i>/<i>N vs.</i> log<sub>10</sub><i>y</i>. Different colors indicate <i>n<sub>act</sub></i>/<i>N</i> for different specified values of <i>N</i>. Black line, Poisson approximation; colored lines, exact solutions. Inset, magnification of the graph for the region of log<sub>10</sub><i>y</i> between 0.3 and 0.5. For each value of <i>N</i>, <i>y</i> can only take on values such that 1/(<i>N</i>−1)≤<i>y</i>≤<i>N</i>−1. Furthermore, although for illustration purposes we have drawn the curves as continuous, <i>y</i> is <i>not</i> a continuous variable, and treating it as such only works for large <i>N</i>. This is most obvious for <i>N</i> = 10, where <i>y</i> can only take on the values 1/9, 1/4, 3/7, 2/3, 1, 3/2, 7/3, 4, and 9. (C) Graphs of <i>n<sub>act</sub></i>/<i>N vs.</i> log<sub>10</sub><i>y</i> for varying values of <i>p</i>, as defined in the text. Graphed using the Poisson approximation. (D) Graphs depicting the fractional error incurred when using the Poisson approximation to estimate <i>n<sub>act</sub></i>/<i>N</i> for screens in which one <i>F</i>1 animal is plated per plate. Although the graphs are continuous, only integer values of <i>y</i> are relevant. Also note that the smallest allowable value of <i>y</i> is chosen so that at least 1 <i>F</i>2 animal is chosen per plate.</p>
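
The discrete values of y listed for N = 10 are consistent with y = k/(N−k) for integers k from 1 to N−1, which also gives the stated bounds 1/(N−1) ≤ y ≤ N−1. This reading is inferred from the caption's list rather than taken from the paper's definitions, and the function name `allowed_ratios` is hypothetical. A minimal sketch that enumerates the values exactly:

```python
from fractions import Fraction

def allowed_ratios(N):
    """Enumerate y = k / (N - k) for k = 1 .. N-1.
    ASSUMPTION: y is the ratio of two positive integers that sum to N,
    inferred from the values the caption lists for N = 10."""
    return [Fraction(k, N - k) for k in range(1, N)]

if __name__ == "__main__":
    # Prints the nine ratios for N = 10 in lowest terms.
    print(", ".join(str(y) for y in allowed_ratios(10)))
```

For N = 10 this yields 1/9, 1/4, 3/7, 2/3, 1, 3/2, 7/3, 4, and 9, matching the caption. With so few admissible points, a continuous curve is only a visual aid at small N.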

    Assessing ‘galign’ predicted polymorphisms by direct sequencing.

    <p>N/A, not applicable. Bold words in last column reflect positions at which a sequence change was predicted by ‘galign’.</p>a<p>alteration predicted by Deletion_search; sequenced change is a G-to-C substitution.</p>b<p>galign read both wild-type and mutant sequences here. The mutant sequence was confirmed by sequencing.</p>c<p>alteration predicted by Deletion_search; sequenced change is a C-to-G substitution.</p>d<p>alteration predicted by Deletion_search; sequenced change is an A-to-G substitution.</p>e<p>alteration predicted by Deletion_search.</p>

    General genetic screening scheme in <i>C. elegans</i>.

    <p>(A) <i>P</i>0 animals are mutagenized and allowed to self-fertilize to produce <i>F</i>1 animals. To identify recessive mutations, <i>F</i>1 animals are allowed to self-fertilize to produce the <i>F</i>2 generation. In this paper we consider the case of <i>n F</i>1 animals giving rise to <i>m F</i>2 animals. (B) Plots describing the probability, π(<i>n</i>), that among <i>n F</i>1 animals screened following mutagenesis by EMS (<i>r</i> = 1,250), at least one <i>F</i>1 animal will be heterozygous for a loss-of-function mutation in a gene of interest. The parameter <i>a</i> is as defined in the text. The plots were generated by the program Mathematica 5.0 (Wolfram Research), using <i>n</i> as a continuous variable.</p>

    ‘galign’ output files.

    <p>(A) A portion of a ‘galign’ alignment tool output file indicating the numbers of wild-type and mutant reads at a given position, as well as the corresponding wild-type and mutant sequences displayed in the event that a mutation was detected. (B) A portion of a SNP_search output file is depicted for a search involving exonic sequence substitutions. Position, position with respect to exon start site. Chrom. Pos., position with respect to the indicated chromosome. WT reads, number of wild-type reads at the given position. Mut reads, number of mutant reads at the given position. (C) A portion of a Deletion_search output file is depicted for a search involving deletions spanning exons. Start, End, the start and end coordinates of the deletion with respect to the first nucleotide of the indicated exon. Gen. Pos. Start, Gen. Pos. End, start and end coordinates of the deletion with respect to the indicated chromosome. The Comments column is used to highlight features indicative of true deletions and insertions.</p>

    ‘galign’ alignment algorithm.

    <p>A sequence read is divided into three fragments, <i>A</i>, <i>B</i>, and <i>C</i> (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007188#s2" target="_blank">Results</a>). The algorithm starts at START. seq(A), the sequence of fragment <i>A</i>. seq(a), a genomic sequence matching seq(A) and located at position ‘a’ in the genome. a', an alternate genomic location containing seq(A). L, length of sequence read. Yellow boxes, decision nodes. Green boxes, algorithm repeat nodes. Red boxes, algorithm end points.</p>
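
The flowchart itself is not reproduced in this caption, so the following is only a generic seed-and-verify sketch in the spirit of the description, not the ‘galign’ algorithm. It assumes fragment A is the first third of the read, that candidate positions are exact matches of seq(A) in the genome, and that the rest of the read (fragments B and C) is compared with a small mismatch allowance. The function names and the `max_mismatches` parameter are hypothetical.

```python
def find_all(genome, fragment):
    """Yield every start index at which fragment occurs exactly in genome."""
    start = genome.find(fragment)
    while start != -1:
        yield start
        start = genome.find(fragment, start + 1)

def place_read(genome, read, max_mismatches=1):
    """Try each genomic location of fragment A in turn (position a, then
    alternates a') and accept the first one where the remainder of the
    read matches within max_mismatches. Returns (position, mismatches)
    or None. ASSUMPTION: A is the first third of the read."""
    third = len(read) // 3
    seq_a, rest = read[:third], read[third:]
    for a in find_all(genome, seq_a):
        window = genome[a + third : a + len(read)]
        if len(window) < len(rest):
            continue  # read would run off the end of the genome
        mismatches = sum(1 for x, y in zip(window, rest) if x != y)
        if mismatches <= max_mismatches:
            return a, mismatches
    return None

if __name__ == "__main__":
    genome = "CCCAAAAAACCCGGGTTT"
    # The first "CCC" at position 0 fails verification; the alternate at 9 succeeds.
    print(place_read(genome, "CCCGGGTTT"))
```

Anchoring on an exact match of one fragment keeps the candidate list short, and falling back to alternate locations handles fragments that occur more than once in the genome.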

    Optimal <i>F</i>2-to-<i>F</i>1 screening ratios and screen efficiency calculations.

    <p>(A) Contour plots examining maximal screen efficiency, ε<i><sub>max</sub></i>, as a function of α and γ, for different values of <i>p</i>, for screens where all <i>F</i>1 animals are plated on one or a small number of plates. Plots generated using the program MATLAB (MathWorks). (B) Graphs examining fold increase in work performed as screening ratio (<i>m</i>/<i>n</i>) deviates from its optimal value, for screens where all <i>F</i>1 animals are plated on one or a small number of plates and α/γ = 10. (C) Graphs depicting maximal screen efficiencies as a function of α for screens in which <i>F</i>1 animals are plated individually. In these graphs γ = 1, which is the most common value for this screening mode. (D) Graphs of the optimal <i>F</i>2-to-<i>F</i>1 screening ratios (<i>m</i>/<i>n</i>) for different values of <i>p</i> as functions of α/γ, for screens where all <i>F</i>1 animals are plated on one or a small number of plates. Note that the vertical axis is the natural log of <i>m</i>/<i>n</i> and not the base 10 log. (E) Graphs of the optimal <i>F</i>2-to-<i>F</i>1 screening ratios (<i>m</i>/<i>n</i>) for different values of <i>p</i> as functions of α, for screens in which <i>F</i>1 animals are plated individually. In these graphs γ = 1, which is the most common value for this screening mode.</p>

    Algorithm for performing an optimal genetic screen.

    <p>The flowchart begins in the top left corner at “START”. All parameters and equations are described and derived in the text. Parameters of relevance are also described in the Glossary portion of the figure. Diamond shapes indicate steps where a choice must be made.</p>